ABSTRACT


Classification of satellite ship imagery is an active topic of research, and multiple 
types of classifiers have been considered over the years. This study explores the viability 
of the random forest algorithm in vessel type classification and compares performance to 
that obtained in earlier work by Rainey et al., published in a 2012 SPIE Proceedings 
article, and by Parameswaran and Rainey, published in a 2015 SPIE Proceedings article. 
Random forest is advantageous due to its relative ease of use, resistance to overfitting, 
and built-in model validation. Results indicate that random forest performance is 
comparable to or better than time-tested machine learning methods, such as support 
vector machines, when applied to preprocessed vessel images. Feature extractors that 
capture spatial information yielded the highest accuracies. Previous work has indicated that
the visual bag of words (VBOW) representation is flexible and effective for feature coding
of the vessel images. Therefore, in this work various weighting schemes were applied to the
VBOW, which was evaluated on both original and preprocessed vessel image datasets as
input to the random forest, with limited success.
I. INTRODUCTION


The U.S. Navy has an abiding interest in technologies and processes that enhance 
Maritime Domain Awareness, defined by the International Maritime Organization as “the 
effective understanding of any activity associated with the maritime domain that could 
impact upon the security, safety, economy, or environment” [1, p. 1]. One key component 
of Maritime Domain Awareness is vessel classification and identification. A major source 
of surveillance is satellite imagery, which is becoming more abundant every day. To 
efficiently classify vessels captured in satellite images, robust machine learning algorithms 
are needed. The process for training these machine learning algorithms is shown in 
Figure 1. Supervised learning algorithms used in image classification are trained on 
labelled image datasets. Images then undergo varying degrees of pre-processing, including
rotation, cropping, alignment, resizing, and normalization. Pre-processing may be
necessary so that certain feature extractors can be applied [2]. The feature extractors reduce
the dimensionality of the dataset and extract useful and discriminative attributes of the
images [3], such as corners, blobs, and edges. The feature vectors of the images and the
corresponding labels are then used to build and test the classifier algorithm.





[Figure 1 blocks: Image Dataset → Image Preprocessing → Feature Extractors and Descriptors → Classifier Algorithm]

Figure 1. Block diagram of process for training classifiers


One of the challenges of vessel type classification using satellite imagery is that the
appearance of vessels largely depends on lighting conditions, viewing angle, weather, and
sea state [2]. Additionally, vessels of the same class may exhibit wide variation in
features [2].


A potential candidate for vessel classification is the random forest algorithm. It is
highly versatile, with applications in far-ranging fields including finance and
medicine [4]. More recently, it has become a popular classifier in remote sensing due to its
high accuracy and computational efficiency [5]. The random forest has shown success in
remote sensing applications such as identifying tree health, mapping oil spills, and
classifying insect defoliation levels [5]. The algorithm requires few hyper-parameters and
can quickly process high-dimensional and heterogeneous data. The goal of this research is
to explore the efficacy of the random forest algorithm in vessel classification using satellite
imagery.


A. DATASET 


The data used to train and assess the random forest is BCCT-200, which is
composed of grey-scale satellite images of four classes of vessels (barges, cargo ships,
container ships, and tankers), with 200 vessel images per class. This dataset was
compiled using the RAPid Image Exploitation Resource (RAPIER®), developed by Space
and Naval Warfare (SPAWAR) Systems Center Pacific [2]. RAPIER detects ships in
satellite imagery and in unmanned aerial system video [2]. This study analyzes two
subsets of BCCT-200:


• BCCT-200_orig: No pre-processing is applied to the original images.
These images are of various sizes and show vessels in various orientations.
They are hereafter referred to as original images.

• BCCT-200_resize: These images have been rotated, cropped, aligned, and
resized to 300 × 150 pixels [2]. They are hereafter referred to as resized
images.


Examples of original and resized images are shown in Figure 2 and Figure 3, respectively. 





Barge Cargo Container Tanker 


Figure 2. Original images. Source: [2]. 





Barge Cargo Container Tanker 


Figure 3. Resized images. Source: [2]. 


The quality of the image dataset has a great influence on the effectiveness of the
classifier algorithm [6], so it is important to understand the limitations of BCCT-200.
Image metadata such as sensor type, location, resolution, and sensor geometry were
not kept when the dataset was compiled [6]. The vessel types were assigned by a layman
with limited review [6]. BCCT-200 is a balanced dataset, with equal numbers of images
for each class and a relatively large number of samples per class, whereas military interest
often lies in rare vessel classes [6]. Nevertheless, BCCT-200 has served as a common
benchmark for various ship classification algorithms and techniques.
B. PREVIOUS WORK


Rainey et al. conducted a broad survey to study the efficacy of several feature 
extractor algorithms and classifiers when applied to BCCT-200 [2]. The feature extractors 
used with resized images included Principal Component Analysis (PCA), Random 
Projection, Hierarchical Multiscale Local Binary Patterns (HMLBP), and Histogram of 
Oriented Gradients (HOG) [2]. The classifiers used were support vector machine (SVM), 
Sparse Representation-based Classification (SRC), and nearest neighbor [2]. The only
feature representation that was applied to the original images was the visual bag of words
(VBOW) clustered from scale invariant feature transform (SIFT) feature descriptors [2].
Parameswaran and Rainey conducted a follow-on study focused on the effect of applying
term weighting to the VBOW, motivated by evidence of classification performance
improvement in the natural language processing field [3]. The VBOWs were built using
the original images, and several weights were applied to VBOWs of varying vocabulary
sizes prior to being fed into the SVM classifier [3]. Rainey, Reeder, and Corelli then studied
the efficacy of a convolutional neural network (CNN) in identifying a single class of ship
among images of both various ship classes and non-ship images such as clouds and
glints [7].


Two studies showed the performance improvement achieved through the Multiple
Features Learning (MFL) framework when applied to the BCCT-200 resized and original
images. The MFL combines features extracted by different algorithms to capture both
local and global features. Huang et al. used three types of features: Gabor-based Multiscale
Completed Local Binary Pattern (MS-CLBP), patch-based MS-CLBP and Fisher vector,
and Spatial Pyramid Matching (SPM) augmented VBOW [8]. Shi et al. used an MFL of the
two-dimensional discrete fractional Fourier transform, Completed Local Binary Pattern
(CLBP), and Gabor filter [9]. The MFL was fed into a deep CNN for classification [9].


Two studies have shown the efficacy of the random forest for vessel classification
using Automatic Identification System (AIS) data as features. Zhong, Song, and Yang used
vessel perimeter, area, ratio, and shape to classify cargo ships, tankers, and fishing
vessels [10]. Snapir, Waine, and Biermann used the vessel longitude, latitude, length,
closest distance to shore, and time to distinguish between fishing and non-fishing
vessels [11].


In Chapter II, we introduce the theory and methodology of the feature extractors and
classifiers used in this research. Next, in Chapter III, we discuss the parameters and
implementation of the experiments. In Chapter IV, we report the experimental results, and
in Chapter V we present our conclusions and recommendations for future work.
II. BACKGROUND


Image classification performance relies heavily on the features used to separate
classes and on the classification models considered in the classification task. In this
chapter, we first introduce the feature extractors, HOG and LBP, used on the ship image
database considered in the study [1]. Next, we introduce the visual bag of words (VBOW)
and the algorithms selected to populate the vocabulary of visual words: the scale invariant
feature transform (SIFT), used in [3], and the speeded-up robust feature (SURF), used in
this study. Finally, we present a brief overview of the decision tree, random forest, and
support vector machine models considered in the study.
A. FEATURE EXTRACTORS AND DESCRIPTORS 


1. Histogram of Oriented Gradients 


The histogram of oriented gradients is a feature descriptor that captures the
distribution of local intensity gradients, and hence of edge directions, in an image [12]. It
was originally designed to facilitate human detection [12] and has since been used in a
wide variety of image classification tasks.

The process of generating the HOG feature vector is as follows.


The image is divided into cells of a given size. The horizontal ($g_x$) and vertical
($g_y$) gradients are determined by filtering the image with the 1-dimensional centered
derivative masks $[-1, 0, 1]$ and $[-1, 0, 1]^T$ [12]. The gradient magnitude ($g$) and
direction ($\theta$) are calculated, respectively, as

$$g = \sqrt{g_x^2 + g_y^2} \qquad (1)$$

and

$$\theta = \arctan\left(\frac{g_y}{g_x}\right). \qquad (2)$$

If a gradient direction is negative, 180° is added to it, so that all directions fall
between 0° and 180°; these are called unsigned gradients. In each cell, the magnitudes of
the unsigned gradients are then sorted by direction into 9 bins spanning 0° to 180° to form
a 9 × 1 histogram [13]. Note that the gradient magnitude is distributed between the two
nearest bins by interpolation when the gradient direction lies between bin centers. The
binning and interpolation process for an 8 × 8 cell is illustrated in Figure 4.
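The per-cell steps above can be illustrated with a short Python/NumPy sketch. It is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `cell_histogram` is hypothetical, border pixels of the cell are given zero gradient, and each magnitude is split linearly between the two nearest bin centers (10°, 30°, ..., 170°). Taking `np.arctan2` modulo 180° yields the same unsigned direction as Eq. (2) followed by the 180° correction.

```python
import numpy as np

def cell_histogram(cell, n_bins=9):
    """9-bin histogram of unsigned gradients for one cell (illustrative sketch)."""
    cell = cell.astype(float)
    # Centered [-1, 0, 1] masks; border pixels keep zero gradient (simplification).
    gx = np.zeros_like(cell)
    gy = np.zeros_like(cell)
    gx[:, 1:-1] = cell[:, 2:] - cell[:, :-2]
    gy[1:-1, :] = cell[2:, :] - cell[:-2, :]
    mag = np.hypot(gx, gy)                             # Eq. (1)
    theta = np.degrees(np.arctan2(gy, gx)) % 180.0     # unsigned direction in [0, 180)
    bin_width = 180.0 / n_bins                         # 20 degrees per bin
    # Position relative to bin centers at 10, 30, ..., 170 degrees.
    pos = theta / bin_width - 0.5
    lo = np.floor(pos).astype(int)
    frac = pos - lo
    hist = np.zeros(n_bins)
    # Linear interpolation between the two nearest bins (wrapping 170 -> 10 degrees).
    np.add.at(hist, lo % n_bins, mag * (1.0 - frac))
    np.add.at(hist, (lo + 1) % n_bins, mag * frac)
    return hist

vertical_edge = np.tile([0, 0, 0, 0, 1, 1, 1, 1], (8, 1))
print(cell_histogram(vertical_edge))
```

For a cell containing a vertical edge, every gradient points at 0°, which lies on the boundary between the first and last bins, so the magnitude is split evenly between them.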
[Figure 4 panel: Histogram of Gradients]

Figure 4. Creating histogram of 8 × 8 cell. Source: [13].


An overlay of the histograms in polar form [14], obtained from an image of a
container ship, is illustrated in Figure 5. The angle of each line depicted in the right-hand
image corresponds to a bin, and the line length corresponds to the sum of gradient
magnitudes in that bin.





Figure 5. 9-bin histograms illustrated and labeled


The image is then divided into blocks of 2 × 2 cells, which overlap each other by one
cell horizontally and vertically. The four histograms in each block are concatenated to
form a 36 × 1 vector. The 36 × 1 vector is then normalized by dividing every element by
the $L_2$ norm of the vector [12]. The normalization takes into account local changes in
illumination and contrast in the image [12]. The normalized 36 × 1 vectors of each block
are then concatenated row by row to form the final feature vector of the image [13].
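The block normalization and the resulting feature length can be sketched as follows. The helper names are hypothetical, and the defaults (8 × 8 cells, 2 × 2 blocks, a block stride of one cell, 9 bins) are assumptions; the cell size used in the experiments may differ. With these settings, the 64 × 128 pixel pedestrian detection window commonly used with HOG gives 3,780 features, which serves as a sanity check.

```python
import numpy as np

def normalize_block(cell_hists, eps=1e-12):
    """Concatenate the four 9-bin histograms of a 2 x 2 block into a 36 x 1
    vector and divide by its L2 norm (eps guards against division by zero)."""
    v = np.concatenate(cell_hists).astype(float)
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)

def hog_length(height, width, cell=8, block=2, overlap=1, bins=9):
    """Length of the final HOG vector for an image of the given size."""
    cells_y, cells_x = height // cell, width // cell
    step = block - overlap                      # block stride, in cells
    blocks_y = (cells_y - block) // step + 1
    blocks_x = (cells_x - block) // step + 1
    return blocks_y * blocks_x * block * block * bins

# Resized BCCT-200 image with hypothetical 8 x 8 cells.
print(hog_length(150, 300))   # 22032
```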


2. Local Binary Pattern 


The local binary pattern was designed to capture local textures of an image [15],
i.e., the spatial arrangement of pixel intensities [16]. LBP has shown outstanding results in
facial analysis and is useful in applications such as video conferencing and tracking and
identifying people [17]. The process of generating the LBP feature vector is as follows.

The image is divided into cells of a given size, and in each cell the grayscale
intensity of every pixel is compared to the intensities of surrounding pixels. A sample
point ($g_p$) is assigned 1 if its grayscale value is greater than or equal to that of the center
pixel ($g_c$), and 0 otherwise [18]. The number of sampling points $P$ and the radius $R$ of
the circle on which they lie are selected by the user. The gray value is interpolated from
surrounding pixels if a sampling point does not fall at the center of a pixel [15]. The
comparison yields a cyclical binary code, as illustrated in Figure 6.
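The comparison step can be sketched for the simplest case, P = 8 and R = 1. This is an illustration only: the helper names are hypothetical, the eight neighbors are read directly from a 3 × 3 square rather than interpolated on a circle, and the starting neighbor and direction used to order the bits are arbitrary conventions.

```python
import numpy as np

def lbp_bits(patch):
    """Cyclical binary code of the center pixel of a 3 x 3 patch (P = 8, R = 1).

    Simplification: neighbors are the pixels of the 3 x 3 square, so the four
    diagonal points are not interpolated as circular sampling would require."""
    gc = patch[1, 1]
    # Circular order: right neighbor first, then counterclockwise around the center.
    neighbors = [patch[1, 2], patch[0, 2], patch[0, 1], patch[0, 0],
                 patch[1, 0], patch[2, 0], patch[2, 1], patch[2, 2]]
    return [1 if gp >= gc else 0 for gp in neighbors]

def bits_to_decimal(bits):
    """Interpret bit p as the coefficient of 2**p."""
    return sum(b << p for p, b in enumerate(bits))

patch = np.array([[9, 9, 9],
                  [1, 5, 9],
                  [1, 1, 1]])
bits = lbp_bits(patch)
print(bits, bits_to_decimal(bits))   # [1, 1, 1, 1, 0, 0, 0, 0] 15
```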





Binary Code: 
11110000 


Figure 6. 8-bit binary code for center pixel. Adapted from [19]. 


In this feature descriptor, binary codes are classified as either uniform or
non-uniform. The binary code for a pixel is considered uniform if it contains at most two
bitwise transitions (from 0 to 1 or from 1 to 0) when traversed circularly, and non-uniform
otherwise [20]. In [21], it was shown that the uniform binary codes correspond to
distinguishing features in images, such as spots, edges, line ends, and corners, as depicted
in Figure 7.


[Figure 7 panels: Spot, Spot/Flat, Line end, Edge, Corner]
Black and white circles correspond to bit values of 0 and 1, respectively.


Figure 7. Uniform patterns and associated features. Source: [22]. 


The binary codes are then converted to decimal values, and a histogram of the
decimal values is computed for each cell. To keep only distinguishing features, all
non-uniform codes are grouped into one bin, regardless of decimal value. The number of
bins in each histogram, which counts the $P(P-1) + 2$ uniform codes plus one shared bin for
all non-uniform codes, is calculated in [23] as

$$P(P-1) + 3. \qquad (3)$$

Each histogram is then normalized by its $L_2$ norm, and the normalized histograms are
concatenated row by row to form the final feature vector.
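The uniformity test and the bin count of Eq. (3) can be checked by brute force. The sketch below, with hypothetical helper names, enumerates all $2^P$ codes, counts those with at most two circular transitions, and adds one shared bin for the non-uniform codes.

```python
def transitions(bits):
    """Number of 0/1 changes when the code is traversed circularly."""
    P = len(bits)
    return sum(bits[p] != bits[(p + 1) % P] for p in range(P))

def lbp_histogram_bins(P):
    """One bin per uniform code plus one shared bin for all non-uniform codes."""
    uniform = sum(
        1 for code in range(2 ** P)
        if transitions([(code >> p) & 1 for p in range(P)]) <= 2
    )
    return uniform + 1

for P in (4, 8):
    print(P, lbp_histogram_bins(P), P * (P - 1) + 3)
# 4 15 15
# 8 59 59
```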


3. Scale Invariant Feature Transform 


The scale invariant feature transform is widely used in image processing for object
recognition and classification, biometrics, and robotics [24]. It is designed to find and
describe interest points at different levels of resolution, or scales ($\sigma$), of the image [25].
These interest points are the locations of blobs, i.e., points or areas that stand out from
their surroundings. The scales are grouped into octaves (each a doubling of $\sigma$), where the
image of each successive octave is down-sampled by a factor of 2 to reduce
computation [26]. Within each octave, the scale space $L(x, y, \sigma)$ is calculated by
convolving the image $I(x, y)$ with Gaussian filters $G(x, y, \sigma)$ of increasing scale, as
shown in [26]:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad (4)$$

where

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}. \qquad (5)$$
Interest point locations correspond to local extrema of the difference of Gaussian
(DoG) blurred images $D(x, y, \sigma)$, given in [26] as

$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma), \qquad (6)$$

where $k$ is a constant multiplicative factor applied to the scale parameter. The DoG
function is a good approximation of the scale-normalized Laplacian of Gaussian,
$\sigma^2 \nabla^2 G$, which results in scale-invariant interest points [26]. As illustrated in
Figure 8, the convolution with Gaussians produces the scale space images on the left [26].
Adjacent Gaussian images in each octave are subtracted to produce the DoG images on the
right.
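Eqs. (4) to (6) for a single octave can be sketched with NumPy alone. The parameter values (initial scale 1.6, $k = \sqrt{2}$, four Gaussian images), the truncation of the kernel at $3\sigma$, and the reflected image borders are illustrative assumptions, not the settings of [26] or of this study. The 2-D Gaussian of Eq. (5) is applied as two 1-D passes because it is separable.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Samples of a 1-D Gaussian out to 3 sigma, normalized to sum to 1.
    The 2-D kernel of Eq. (5) is the outer product of two of these."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), Eq. (4), with reflected borders."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img.astype(float), r, mode="reflect")

    def conv(v):
        # 'valid' mode discards the padded border, restoring the original length.
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)   # filter along x
    return np.apply_along_axis(conv, 0, rows)     # then along y

def dog_octave(img, sigma0=1.6, k=2 ** 0.5, n_gauss=4):
    """Gaussian images at scales sigma0 * k**s and their differences, Eq. (6)."""
    gauss = [blur(img, sigma0 * k ** s) for s in range(n_gauss)]
    return [gauss[s + 1] - gauss[s] for s in range(n_gauss - 1)]
```

A constant image produces DoG images that are zero everywhere, while an isolated bright pixel produces a negative DoG response at its location, because the wider Gaussian spreads its energy more thinly.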
Figure 8. Gaussian and Difference of Gaussian (DoG) images. Source: [27].


Each pixel in the DoG images is compared to its 8 neighbors at the same scale and its
9 neighbors at each of the two adjacent scales [27]. Local maxima or minima are potential
interest points. Low-contrast candidates and responses along edges are removed because
of their sensitivity to noise [26]. To assign an interest point its orientation, a 36-bin gradient
orientation histogram covering 0° to 360° is populated from its neighborhood in the
corresponding Gaussian-filtered image [27]. The orientation at the histogram peak is then
assigned to the interest point [26]. The interest point descriptor is generated by dividing the
neighborhood region centered on the key point into 16 (4 × 4) square sub-regions. An 8-bin
histogram of gradient orientations covering 0° to 360° is populated for each sub-region, and
the histograms are concatenated to form a 128-dimensional (16 × 8) feature vector.
Rotation invariance is achieved by rotating the gradient orientations of the descriptor
relative to the orientation of the interest point, and contrast invariance is achieved by
dividing every element of the vector by the $L_2$ norm of the vector [26].
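The 26-neighbor comparison can be written as a small helper. The function name is hypothetical; it takes a list of same-size DoG images from one octave and assumes an interior pixel at an interior scale. The low-contrast and edge rejection steps, as well as sub-pixel localization, are omitted from this sketch.

```python
import numpy as np

def is_extremum(dogs, s, y, x):
    """True if D at scale index s and pixel (y, x) is strictly greater or
    strictly smaller than all 26 neighbors: 8 at the same scale and 9 at
    each adjacent scale. Requires 1 <= s <= len(dogs) - 2 and an interior pixel."""
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[s - 1:s + 2]])  # 3 x 3 x 3
    center = cube[1, 1, 1]
    neighbors = np.delete(cube.ravel(), 13)   # flat index 13 is the cube center
    return bool(center > neighbors.max() or center < neighbors.min())
```

Ties are treated as non-extrema here, so a flat DoG neighborhood yields no interest point.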


4. Speeded-Up Robust Feature 


The speeded-up robust feature is scale, rotation, and contrast invariant, similar to
SIFT [28]. SURF improvements over SIFT include reduced sensitivity to noise and faster
computation [28]. Interest points are detected by first calculating approximate Hessian
matrices at various scales. For a point $(x, y)$ at scale $\sigma$, the Hessian matrix is given
in [28] as

$$H(x, y, \sigma) = \begin{bmatrix} L_{xx}(x, y, \sigma) & L_{xy}(x, y, \sigma) \\ L_{xy}(x, y, \sigma) & L_{yy}(x, y, \sigma) \end{bmatrix}, \qquad (7)$$

where

$$L_{xx}(x, y, \sigma) = \frac{\partial^2 G(x, y, \sigma)}{\partial x^2} * I(x, y), \qquad (8)$$

and similarly for $L_{xy}(x, y, \sigma)$ and $L_{yy}(x, y, \sigma)$. Note that the second-order
Gaussian derivative convolutions are approximated with box filter convolutions, similar to
the way SIFT approximates the Laplacian of Gaussian. The convolution results are labeled
$D_{xx}$, $D_{yy}$, and $D_{xy}$, as illustrated in Figure 9.





Figure 9. 9 × 9 discrete Gaussian second order partial derivative
convolutions and corresponding box filter convolutions. 
Adapted from [29]. 


The convolution of the image with a box filter is computationally efficient due to the
use of integral images. The integral image $I_\Sigma(x, y)$ at any point $(x, y)$ is the sum of
all pixel intensities above and to the left of that point [28]. The area of summation for the
integral image of point D is illustrated as a blue rectangle in Figure 10. Once the integral
image has been calculated for every point in the image, it takes at most three additions or
subtractions to find the sum of intensities for a rectangle of any size. The sum of intensities
of the gray rectangle ($\Sigma$) shown in Figure 10 is calculated using points A, B, C, and D as

$$\Sigma = I_\Sigma(A) - I_\Sigma(B) - I_\Sigma(C) + I_\Sigma(D). \qquad (9)$$





Figure 10. Integral image calculation. Adapted from [28]. 
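Integral images and Eq. (9) are straightforward to sketch with cumulative sums. The helper names are hypothetical, and the corner indexing is our reading of Figure 10: A is the pixel diagonally above and to the left of the rectangle, B the pixel just above its top-right corner, C the pixel just left of its bottom-left corner, and D its bottom-right pixel.

```python
import numpy as np

def integral_image(img):
    """I_sum(x, y): sum of all pixels above and to the left of (x, y), inclusive."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom and columns left..right (inclusive) using at
    most three additions/subtractions of integral-image values, Eq. (9)."""
    total = ii[bottom, right]                  # D
    if top > 0:
        total -= ii[top - 1, right]            # B
    if left > 0:
        total -= ii[bottom, left - 1]          # C
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]         # A
    return total

img = np.arange(20).reshape(4, 5)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3), img[1:4, 1:4].sum())
```

Because the cost of `box_sum` does not depend on the rectangle size, enlarging the box filters in Table 1 does not increase the cost of each filter response.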


The determinant of the resulting approximate Hessian serves as a blob detector and is
calculated in [28] as

$$\det(H_{\text{approx}}) = D_{xx} D_{yy} - (w D_{xy})^2, \qquad (10)$$

where the weight $w = 0.9$ is selected to conserve energy between the Gaussian
second-order derivatives and the box filters [28]. The box filter size is increased in octaves
to find interest points at different scales, and the local maxima are then localized and
interpolated in the $x$, $y$, and $\sigma$ dimensions. For each new octave, both the interval
between subsequent filter sizes and the sampling interval of interest points are doubled, as
shown in Table 1, which reduces computation time [28]. The SURF descriptor is computed
by first finding the dominant orientation of gradients from a circle centered on the interest
point and then overlaying a square divided into 4 × 4 sections aligned with that
orientation [30]. Gaussian-weighted Haar wavelet responses in the horizontal direction
($d_x$) and vertical direction ($d_y$), taken with respect to the interest point orientation, are
then summed in each section [28]. Information about the polarity of intensity changes is
encoded by also including the sums of $|d_x|$ and $|d_y|$, resulting in a 4-dimensional vector
given in [28] as

$$v = \left(\sum d_x, \sum d_y, \sum |d_x|, \sum |d_y|\right). \qquad (11)$$


A 64-dimensional descriptor is obtained by concatenating the 4-dimensional
vectors of the 16 sections of the overlaid square. There is an upright version of SURF
(U-SURF) that does not encode the dominant orientation of the interest point [28]. U-SURF
is not rotation invariant, but it is faster to compute [28].
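The assembly of the 64-dimensional descriptor from Eq. (11) can be sketched as follows. The sketch assumes the Haar responses have already been computed, rotated to the interest point orientation, and Gaussian weighted on a 20 × 20 grid (5 × 5 samples per section); the function name and the final unit-length normalization are assumptions of this illustration.

```python
import numpy as np

def surf_descriptor(dx, dy):
    """64-D SURF-style descriptor from 20 x 20 arrays of Haar responses.
    Each of the 4 x 4 sections contributes (sum dx, sum dy, sum |dx|, sum |dy|)."""
    n = dx.shape[0] // 4                      # samples per section side (5 here)
    parts = []
    for i in range(4):
        for j in range(4):
            sx = dx[i * n:(i + 1) * n, j * n:(j + 1) * n]
            sy = dy[i * n:(i + 1) * n, j * n:(j + 1) * n]
            parts.extend([sx.sum(), sy.sum(), np.abs(sx).sum(), np.abs(sy).sum()])
    v = np.array(parts)
    return v / np.linalg.norm(v)              # unit length for contrast invariance

rng = np.random.default_rng(0)
dx, dy = rng.normal(size=(2, 20, 20))
print(surf_descriptor(dx, dy).shape)          # (64,)
```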


Table 1. SURF octaves and corresponding filter sizes. Adapted from [28].

Octave # | Box Filter Sizes | Filter Interval
1 | 9 × 9, 15 × 15, 21 × 21, 27 × 27 | 6
2 | 15 × 15, 27 × 27, 39 × 39, 51 × 51 | 12
3 | 27 × 27, 51 × 51, 75 × 75, 99 × 99 | 24
4 | 51 × 51, 99 × 99, 147 × 147, 195 × 195 | 48
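The filter sizes in Table 1 follow a simple pattern: with octave $o$ and interval $s$, both counted from 1, the side length is $3(2^o s + 1)$. This closed form is our observation rather than a formula stated in [28], which describes the sizes incrementally; the short sketch below reproduces the table.

```python
def surf_filter_sizes(octave, intervals=4):
    """Box filter side lengths for a 1-based octave (closed form reproducing Table 1)."""
    return [3 * (2 ** octave * s + 1) for s in range(1, intervals + 1)]

for o in range(1, 5):
    sizes = surf_filter_sizes(o)
    print(o, sizes, sizes[1] - sizes[0])
# 1 [9, 15, 21, 27] 6
# 2 [15, 27, 39, 51] 12
# 3 [27, 51, 75, 99] 24
# 4 [51, 99, 147, 195] 48
```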





A sample of SURF interest points and a histogram of those interest points are
depicted in Figure 11 and Figure 12, respectively. The size of each circle centered on an
interest point in Figure 11 is directly proportional to the scale at which the interest point
was detected. The histogram in Figure 12 shows that the number of interest points
decreases rapidly as scale increases.
Figure 11. SURF interest points

[Figure 12 x-axis: scale]

Figure 12. Histogram of interest points obtained from image in Figure 11


5. K-means Clustering


Classifiers such as SVM and random forest require all input feature vectors to be
the same length [31]. However, the number of SURF interest points extracted from an
image is image dependent. To fix the feature length, SURF interest points are combined by
vector quantization [31]. First, SURF interest points are extracted from all images in the
dataset. Then they are clustered using K-means clustering, which labels each interest point
by the index of the cluster it is assigned to [31]. The K-means algorithm starts with a group
of k randomly selected points [32], i.e., cluster centers in the 64-dimensional feature space.
Each data point is assigned to the nearest center. Each cluster center is then recomputed as
the mean of all points assigned to it, which is the location that minimizes the sum of squared
Euclidean distances to those points [32]. Next, points are reassigned based on the updated
locations of the cluster centers. This process is repeated to minimize the sum of squared
Euclidean distances between points ($x_i$) and their nearest cluster center ($m_k$), given
in [32] as

$$D(X, M) = \sum_{\text{cluster } k} \; \sum_{\text{point } i \text{ in cluster } k} \lVert x_i - m_k \rVert^2. \qquad (12)$$
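The K-means iteration described above (Lloyd's algorithm) can be sketched with NumPy. This is a minimal version for illustration with hypothetical names: centers are initialized from k randomly chosen data points, empty clusters keep their previous center, and the iteration stops when the centers no longer move. Library implementations typically add smarter seeding and multiple restarts, and nothing here reflects the clustering settings used in this study.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns the centers M (k x d) and point labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(X, centers)

def objective(X, centers, labels):
    """D(X, M) of Eq. (12): total squared distance from points to their centers."""
    return float(((X - centers[labels]) ** 2).sum())
```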


Each cluster is considered a visual word, so k represents the chosen vocabulary
size. Images are then represented by a histogram of the frequencies of occurrence of the
visual words, without regard to spatial arrangement, as illustrated in Figure 13. Note that the
clustering process is the same for SIFT interest points; the difference is that the clustering
takes place in a 128-dimensional feature space.
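Once the cluster centers (visual words) are fixed, each image's variable-length set of descriptors becomes a fixed-length histogram. The helper below is hypothetical; for readability, the usage example uses 2-dimensional descriptors instead of 64-dimensional SURF or 128-dimensional SIFT descriptors.

```python
import numpy as np

def vbow_histogram(descriptors, centers):
    """Map each descriptor of one image to its nearest visual word and count
    occurrences; the result has length k regardless of how many descriptors
    the image produced."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(centers))

centers = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0], [11.0, 10.0]])
print(vbow_histogram(desc, centers))   # [2 2]
```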


[Figure 13 panel (b) title: Histogram of Barge 1 Resized]
Figure 13. (a) Barge 1 resized image and (b) 5000-word histogram





6. Visual Bag of Words 


The concept of the visual bag of words is derived from the bag of words
representation, which was developed to facilitate text classification [33]. The bag of words
represents a document by a vector of the frequencies of occurrence of the words that
comprise the text [34]. Term weighting schemes applied to the bag of words have been
shown to be effective in increasing text classification accuracy, and they can also be applied
to VBOW features to increase image classification accuracy [3].


Term weighting is composed of three factors: the term frequency (TF) factor, the
collection frequency (CF) factor, and the normalizing factor [3]. The term frequency factor
depends on the frequency of occurrence of a visual word in an image [3]. The three TF
factors used in this work are

1. Raw term frequency (RAW): simply the original representation [3]
2. Binary (BINARY) term frequency: assigns a 1 if the visual word is present
and 0 if absent [3]
3. Log term frequency (LOG) weight, defined in [3] as

$$\log tf = \log(1 + tf). \qquad (13)$$
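The three TF factors can be sketched as one function. The name `tf_factor` is hypothetical, and the natural logarithm in Eq. (13) is an assumption, since the base is not stated.

```python
import numpy as np

def tf_factor(counts, scheme="RAW"):
    """TF factor for a vector of visual-word counts from one image."""
    counts = np.asarray(counts, dtype=float)
    if scheme == "RAW":
        return counts                          # original representation
    if scheme == "BINARY":
        return (counts > 0).astype(float)      # 1 if present, 0 if absent
    if scheme == "LOG":
        return np.log(1.0 + counts)            # Eq. (13), natural log assumed
    raise ValueError(f"unknown TF scheme: {scheme}")
```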


The collection frequency factor includes both unsupervised and supervised weights.
Unsupervised weights are calculated without regard to the class of the vessels in which the
visual word occurs, whereas supervised weights are class dependent [3]. The nine CF
factors considered are

1. (Unsupervised) Inverse document frequency (IDF) weighs each visual word
by a factor that decreases with the number of images it appears in,
calculated in [3] as

$$idf = \log\left(\frac{N}{n_w}\right), \qquad (14)$$

where $N$ is the total number of images and $n_w$ is the number of images that
contain visual word $w$.


2. (Unsupervised) Probabilistic inverse document frequency (PIDF) is a
variant of IDF, described in [3] as

$$pidf = \log\left(\frac{N - n_w}{n_w}\right). \qquad (15)$$

For the following equations, $N$ represents the total number of images; $a$ and $b$
are the numbers of images of vessel type $i$ in which visual word $t$ occurs and is
absent, respectively; and $c$ and $d$ are the numbers of images not of vessel type $i$ in
which $t$ occurs and is absent, respectively.


3. (Supervised) The $\chi^2$ (CHI2) statistic assesses the level of dependence
between visual word $t$ and vessel type $i$ and is given in [3] as

$$\chi^2(t, i) = \frac{N(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}. \qquad (16)$$

4. (Supervised) Mutual information (MI) measures the amount of
information visual word $t$ carries about vessel type $i$ and is estimated in [3]
as

$$mi(t, i) = \log\left(\frac{aN}{(a + c)(a + b)}\right). \qquad (17)$$

5. (Supervised) Information gain (IG) measures the reduction in entropy of the
vessel type given knowledge of whether visual word $t$ is present, and it is
expressed in [3] as

$$ig(t, i) = \frac{a}{N}\log\left(\frac{aN}{(a + c)(a + b)}\right) + \frac{b}{N}\log\left(\frac{bN}{(b + d)(a + b)}\right) + \frac{c}{N}\log\left(\frac{cN}{(a + c)(c + d)}\right) + \frac{d}{N}\log\left(\frac{dN}{(b + d)(c + d)}\right). \qquad (18)$$

6. (Supervised) Odds ratio (OR) describes the association between the
occurrence of visual word $t$ and vessel type $i$ and is calculated in [3] as

$$or(t, i) = \frac{ad}{bc}. \qquad (19)$$
7. (Supervised) Log odds ratio (LOGOR) takes the logarithm of the odds ratio
[3].

8. (Supervised) Relevance frequency (RF) weighs visual words according to
the ratio of the number of images of vessel type $i$ that contain visual word $t$
to the number of images of other vessel types that contain it, given in [3] as

$$rf(t, i) = \log\left(2 + \frac{a}{\max(1, c)}\right). \qquad (20)$$

9. Unit weights (NONE) are applied for comparison.
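The collection frequency factors of Eqs. (14) to (20) can be computed from the counts $a$, $b$, $c$, and $d$ for a single visual word and vessel type. The sketch below is illustrative: the function names are hypothetical, natural logarithms are assumed except for RF (base 2, also an assumption), and zero counts, which make OR, LOGOR, and IG undefined without smoothing, are not handled. How the per-type weights are combined into one weight per visual word is outside the scope of this sketch.

```python
import math
import numpy as np

def contingency(present, labels, vessel_type):
    """Counts a, b, c, d for one visual word and one vessel type.
    present[j] is True if the word occurs in image j; labels[j] is its class."""
    present = np.asarray(present, dtype=bool)
    pos = np.asarray(labels) == vessel_type
    a = int(np.sum(present & pos))     # type i, word present
    b = int(np.sum(~present & pos))    # type i, word absent
    c = int(np.sum(present & ~pos))    # other types, word present
    d = int(np.sum(~present & ~pos))   # other types, word absent
    return a, b, c, d

def cf_weights(a, b, c, d):
    """CF factors of Eqs. (14)-(20) for one (word, type) pair; assumes a, b, c, d > 0."""
    N = a + b + c + d
    n_w = a + c                                  # images containing the word

    def ig_term(x, col, row):                    # one summand of Eq. (18)
        return x / N * math.log(x * N / (col * row))

    odds = (a * d) / (b * c)
    return {
        "IDF": math.log(N / n_w),
        "PIDF": math.log((N - n_w) / n_w),
        "CHI2": N * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d)),
        "MI": math.log(a * N / ((a + c) * (a + b))),
        "IG": (ig_term(a, a + c, a + b) + ig_term(b, b + d, a + b)
               + ig_term(c, a + c, c + d) + ig_term(d, b + d, c + d)),
        "OR": odds,
        "LOGOR": math.log(odds),
        "RF": math.log2(2 + a / max(1, c)),      # base 2 is an assumption
        "NONE": 1.0,
    }
```

When the word occurs at the same rate inside and outside the vessel type (for example a = 10, b = 10, c = 30, d = 30), CHI2, MI, IG, and LOGOR are all zero and OR is one, as expected for independence.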


Normalization is used to account for differences in image sizes [3]. Two techniques
are considered:

1. No normalization.
2. $L_2$-normalized TF weights: TF factors are normalized before CF factors
are applied [3].
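Combining the three factors for one image can be sketched as follows. The function name is hypothetical, the choice of the $L_2$ norm is an assumption, and the CF vector is assumed to hold one precomputed weight per visual word.

```python
import numpy as np

def weight_vbow(counts, cf, tf=np.log1p, normalize=True):
    """Weighted VBOW vector for one image.
    counts: raw visual-word counts; cf: one CF factor per visual word;
    tf: TF factor function (np.log1p gives the LOG weight of Eq. 13)."""
    w = tf(np.asarray(counts, dtype=float))
    if normalize:                               # L2 normalization of TF weights (assumed norm)
        norm = np.linalg.norm(w)
        if norm > 0:
            w = w / norm
    return w * np.asarray(cf, dtype=float)      # CF factors applied after normalization
```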


B. CLASSIFIERS 


In this section, the two classifiers used in this research, support vector machines
and random forest, are introduced and compared. The discussion includes general theory
and the specific application of both in this research. A section on decision trees is also
included to provide the foundation for understanding the random forest algorithm.


1. Support Vector Machine


The support vector machine (SVM) algorithm was originally designed for binary
classification [35]. Common applications include image classification, biometrics, and text
categorization [36]. In contrast to the random forest, SVMs have numerous
hyper-parameters, need modification to process multi-class datasets, and require
cross-validation to accurately assess performance.


To train an SVM, the data are first separated into a training set and a test set, and
the two classes are labelled 1 and -1. The SVM then generates a hyperplane that separates
the set of positive data from the set of negative data [37]. The position of the hyperplane is
calculated to maximize the gap (margin) between the two classes by solving a constrained
quadratic optimization problem [35]. The support vectors are the subset of training samples
closest to the hyperplane [38], as illustrated in Figure 14.


Positive signs and negative signs correspond to positive and negative data,
respectively.

Figure 14. Hyperplane and associated support vectors. Source: [38].


However, the training data may not be completely separable due to outliers, noise,
or non-linear characteristics [35]. Soft margin SVMs allow some training samples to fall
within the margin or on the wrong side of the hyperplane and penalize these violations. The
weight of this penalty is called the cost and is selected by the user. The SVM can be
extended to non-linear models by mapping the original space into a higher-dimensional
feature space, specified by the user-chosen kernel [37]. Some common kernel options are
linear, polynomial, exponential, and Gaussian kernels [35].


In this research, multiple binary soft margin SVMs were used in an error correcting
output code (ECOC) model to classify the multi-class dataset. The six SVMs considered
each had a unique encoding of the four vessel classes. For each SVM, one class was
assigned as the positive data (1), one was assigned as the negative data (-1), and the other
two were removed prior to training (0), as recorded in Table 2, where $m_{ix}$ is the matrix
element corresponding to class $i$ and SVM $x$.


Table 2. ECOC matrix

Class | SVM 1 | SVM 2 | SVM 3 | SVM 4 | SVM 5 | SVM 6
Barge | 1 | 1 | 1 | 0 | 0 | 0
Cargo | -1 | 0 | 0 | 1 | 1 | 0
Container | 0 | -1 | 0 | -1 | 0 | 1
Tanker | 0 | 0 | -1 | 0 | -1 | -1
During testing, each image in the test set is fed into all six SVMs. Next, each SVM
computes a classification score ($s_x$) for the testing image, which is the signed distance
from the image feature vector to the hyperplane [39]. The sign of the score corresponds to
the sign of the predicted class. The binary loss $g(m_{ix}, s_x)$ is a class-specific measure
that uses the classification score to determine how well SVM $x$ classifies a testing image
into class $i$ [40]. The predicted class ($\hat{i}$) of the testing image is the class that yields
the minimum average binary loss, calculated in [40] as

$$\hat{i} = \underset{i}{\arg\min} \frac{\sum_{x=1}^{6} |m_{ix}| \, g(m_{ix}, s_x)}{\sum_{x=1}^{6} |m_{ix}|}. \qquad (21)$$


The fraction of test images whose predicted class matches the actual class is the
accuracy of the overall classifier.
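The decoding rule of Eq. (21) with the coding matrix of Table 2 can be sketched in a few lines. The binary loss is not specified above, so the hinge loss $g(m, s) = \max(0, 1 - ms)/2$ is assumed here; the names are hypothetical, and the scores in the usage example are arbitrary values chosen for illustration.

```python
import numpy as np

# Coding matrix of Table 2: rows are classes, columns are SVMs 1-6.
M = np.array([[ 1,  1,  1,  0,  0,  0],   # Barge
              [-1,  0,  0,  1,  1,  0],   # Cargo
              [ 0, -1,  0, -1,  0,  1],   # Container
              [ 0,  0, -1,  0, -1, -1]])  # Tanker
CLASSES = ["Barge", "Cargo", "Container", "Tanker"]

def hinge_loss(m, s):
    """Binary loss g(m, s); the hinge form is an assumption of this sketch."""
    return np.maximum(0.0, 1.0 - m * s) / 2.0

def ecoc_predict(scores):
    """Eq. (21): class with the minimum average binary loss over the SVMs that use it.
    Entries with m = 0 are excluded by the |m| weights."""
    scores = np.asarray(scores, dtype=float)
    losses = (np.abs(M) * hinge_loss(M, scores)).sum(axis=1) / np.abs(M).sum(axis=1)
    return CLASSES[int(np.argmin(losses))]

print(ecoc_predict([-2.0, 0.5, 0.5, 2.0, 2.0, 0.3]))   # Cargo
```

In this example SVMs 1, 4, and 5 all vote for Cargo with large margins, so its average hinge loss is zero, while the weak positive scores of SVMs 2 and 3 are not enough to favor Barge.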


2. Decision Trees 


In classification problems, decision trees apply a sequence of tests to input features
to assign a predicted class label [41]. The decision tree is similar to the SVM in that both
require the user to split the data into a training set and a test set. The decision tree is built
through recursive partitioning of the training set [42]. Unlike SVMs, decision trees can
natively process inputs with any combination of numerical, categorical, and ordinal
features [43]. The major disadvantage of decision trees is that they may overfit the
training dataset. A single decision tree allowed to grow fully will classify the training set
with high accuracy but may not be accurate when classifying the test set. This sensitivity,
or variance, is due to noise or coincidental irregularities in the training set [44].


The technique used to build decision trees in the random forest is the Classification
and Regression Tree (CART) algorithm [45]. CART trees are binary decision trees that are
grown by recursively splitting a node into two child nodes, beginning with the root node,
which contains the whole training set [46]. The splitting criterion is based on Gini impurity,
which in this work is defined as the probability of incorrectly classifying a randomly
chosen vessel image in a node if it were labeled randomly according to the class
distribution in that node [47]. Each node has an associated weighted Gini impurity, which
is calculated in [47] as

$$G = \frac{n}{N} \sum_{i=1}^{C} p(i)\,\bigl(1 - p(i)\bigr), \qquad (22)$$


where nis the number of images sorted into that node, N is the total number of images in 
the training set, Cis the total number of vessel classes, and p(i) is the probability of 
randomly selecting an image belonging to class i in that node. In the dataset used in this 
research, all features are numerical, so the values of every feature are sorted from smallest 
to largest. CART then uses a greedy algorithm for feature selection, in which every value 
of every feature is tested to determine the feature ¢ and threshold value v which minimizes 
the sum of Gini Impurities of the two child nodes [48]. As a matter of convention, images 
with t<vare sent to the left child node and t>v are sent to the right. The splitting is 
continued until all images at a node belong to the same class (G =0) or if all images in a 
node have the same values for each feature [46]. These leaf nodes are labelled with the 
class of the majority. A decision tree with various features and thresholds is illustrated in 
Figure 15. The images in the test set are then run down the decision tree, and the output is 
the label of the leaf node they are sorted into. The output is compared to the true 


classification to determine the accuracy. 


Figure 15. Decision tree
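The greedy split search described above can be sketched in Python as follows. This is a minimal illustration, not the MATLAB implementation used in this work. It assumes in-memory lists of numeric feature vectors and scores each candidate split with the weighted Gini impurity of Eq. (22). As the text describes, it tests every observed value of every feature as a threshold. The function names `weighted_gini` and `best_split` are hypothetical.

```python
from collections import Counter


def weighted_gini(labels, n_total):
    """Weighted Gini impurity of one node, Eq. (22): (n/N) * sum_i p(i) * (1 - p(i))."""
    n = len(labels)
    if n == 0:
        return 0.0
    impurity = sum((c / n) * (1 - c / n) for c in Counter(labels).values())
    return (n / n_total) * impurity


def best_split(X, y):
    """Exhaustively test every observed value of every feature as a threshold.

    Returns (feature index t, threshold v, summed weighted child impurity).
    Samples with x[t] < v go to the left child and x[t] >= v to the right.
    """
    n_total = len(y)
    best = (None, None, float("inf"))
    for t in range(len(X[0])):
        for v in sorted({row[t] for row in X}):
            left = [label for row, label in zip(X, y) if row[t] < v]
            right = [label for row, label in zip(X, y) if row[t] >= v]
            if not left or not right:
                continue  # skip splits that leave one child empty
            score = weighted_gini(left, n_total) + weighted_gini(right, n_total)
            if score < best[2]:
                best = (t, v, score)
    return best


# Toy example: two classes separated by feature 0 at threshold 3.0.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # (0, 3.0, 0.0)
```

A pure split yields a summed impurity of zero, which corresponds to the stopping condition $G = 0$ in the text.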


2. Random Forest

Random forest, developed by Breiman in 2001 [45], is an ensemble method applied to decision trees. From a specific feature set $D$, sets $D_1, \ldots, D_M$ (one for each of the $M$ trees) are populated by bootstrapping from $D$, i.e., selecting random samples with replacement. All sets have the same number of samples as $D$, but approximately one-third of the original data is left out of each bootstrapped set, forming an “out-of-bag” dataset [49]. For the feature set used in this research, the probability of a given feature vector not being chosen for a bootstrapped set is calculated in [50] as

$$P(\text{not chosen}) = \left(1 - \frac{1}{n}\right)^{n} = \left(1 - \frac{1}{800}\right)^{800} \approx 0.368, \qquad (23)$$

where $n$ is the number of samples in the original feature set. This value is close to $e^{-1}$, the limit of $(1 - 1/n)^{n}$ as $n$ grows, which is why roughly one-third of the data is out of bag for each tree.
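Equation (23) can be checked numerically. The same fraction of roughly one-third can also be observed by drawing bootstrapped sets directly. The sketch below is illustrative only. The sample size of 800 comes from the text; the number of simulated draws and the random seed are arbitrary choices.

```python
import math
import random


def prob_not_chosen(n):
    """Eq. (23): chance that one sample never appears in n draws with replacement."""
    return (1 - 1 / n) ** n


def oob_fraction(n, rng):
    """Draw one bootstrapped set of size n and return the fraction left out of bag."""
    in_bag = {rng.randrange(n) for _ in range(n)}
    return 1 - len(in_bag) / n


n = 800
print(round(prob_not_chosen(n), 3))  # 0.368
print(round(math.exp(-1), 3))        # 0.368, the limit as n grows

rng = random.Random(0)  # arbitrary seed for reproducibility
trials = [oob_fraction(n, rng) for _ in range(200)]
mean_oob = sum(trials) / len(trials)
print(f"mean out-of-bag fraction over 200 draws: {mean_oob:.3f}")
```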
Each bootstrapped set is then used to train a decision tree. Feature selection in the random forest algorithm differs from that in the regular decision tree algorithm. Basic decision trees consider all features for assignment to the root and internal nodes. In the random forest, only a random subset of $f$ features is considered at each node [45]. This random subsampling of features reduces the correlation among the trees and results in higher overall accuracy for the random forest. Previous work has shown that setting $f$ to approximately the square root of the total number of features produces the best results [51]. To classify a new image, the feature vector of the image is run down every tree of the forest [22]. The class output by each tree counts as one vote, and the forest returns the class that receives the most votes [23].
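This paragraph describes two steps specific to the random forest: drawing a fresh random subset of $f$ candidate features at each node, and combining tree outputs by voting. Both can be sketched as below. This is an illustration under stated assumptions, not the implementation used in this work. The trees themselves are abstracted away, so `forest_vote` simply takes the list of per-tree class outputs. Ties go to the class that appears first among the votes, which is an arbitrary choice. The class labels "A", "B", and "C" are hypothetical.

```python
import random
from collections import Counter


def candidate_features(n_features, f, rng):
    """Indices of the f features (sampled without replacement) one node may split on."""
    return rng.sample(range(n_features), f)


def forest_vote(tree_outputs):
    """Return the class predicted by the most trees.

    Counter.most_common orders equal counts by first occurrence, so a tie
    goes to the class that appears first in tree_outputs.
    """
    return Counter(tree_outputs).most_common(1)[0][0]


rng = random.Random(1)
features = candidate_features(4896, 70, rng)  # e.g. F = 4896, f = 70 (see Table 4)
print(len(features), len(set(features)))     # 70 70
print(forest_vote(["B", "A", "B", "C"]))     # B
```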


The simplicity of the random forest lies in its requiring only two inputs: the number of trees grown in the forest ($M$) and the number of features ($f$) considered at each node. Overall results have been found to be resistant to overfitting because of the bagging (bootstrap aggregation) process used in the final decision step. In addition, unlike with the SVM, the user does not split the data into a training set and a test set, and cross-validation is not required. Instead, each image is run down only the trees that did not use it during tree construction. For these trees, approximately one-third of the total, the image is out of bag. The resulting output class of the forest is compared to the true class. The out-of-bag error is the proportion of images whose predicted class does not match their true class [22].
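The out-of-bag error described above can be computed from per-tree predictions once the in-bag samples of each tree are known. The sketch below makes two simplifying assumptions for illustration. First, predictions are precomputed in plain lists, where `tree_predictions[m][j]` is tree m's class for image j. Second, ties among out-of-bag votes go to the first-seen class. The sketch also converts the error into the accuracy percentage used later in this work.

```python
from collections import Counter


def oob_error(tree_predictions, in_bag_sets, y_true):
    """Fraction of images misclassified by the trees that did not train on them.

    tree_predictions[m][j]: class that tree m predicts for image j.
    in_bag_sets[m]: indices of the images in tree m's bootstrapped set.
    Images that were in-bag for every tree receive no vote and are skipped.
    """
    errors = scored = 0
    for j, truth in enumerate(y_true):
        votes = [preds[j] for preds, bag in zip(tree_predictions, in_bag_sets)
                 if j not in bag]
        if not votes:
            continue
        scored += 1
        if Counter(votes).most_common(1)[0][0] != truth:
            errors += 1
    return errors / scored


def accuracy_percent(error):
    """Accuracy (%) = 100 * (1 - out-of-bag error)."""
    return 100 * (1 - error)


# Toy forest: four trees, each trained on a single image, so every image gets three OOB votes.
tree_predictions = [[0, 1, 1, 0], [0, 0, 1, 0], [0, 1, 0, 0], [1, 1, 1, 1]]
in_bag_sets = [{0}, {1}, {2}, {3}]
y_true = [0, 1, 1, 1]
err = oob_error(tree_predictions, in_bag_sets, y_true)
print(err, accuracy_percent(err))  # 0.25 75.0
```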


C. SUMMARY


In this chapter we first discussed the feature extractors and classifiers selected for this study. Note that the HOG and LBP methods create the final histogram by concatenating cell histograms row by row, which encodes spatial and structural information. However, the concatenation limits HOG and LBP to images of uniform size. Because the clustering of SURF/SIFT interest point descriptors omits spatial information, the VBOW feature representation can be applied to datasets with images of non-uniform size. Finally, we introduced the two classifiers considered in this study: SVM and random forest. In the next chapter, we describe the parameters and order of experimentation followed in this research.
III. EXPERIMENTAL APPROACH


In this chapter, we describe the investigations conducted to evaluate the performance of the random forest model in classifying the BCCT-200 ship images. The random forest classifier is first applied to resized images, with the goal of comparing its performance to that of the SVM classifier when HOG, LBP, and VBOW are used as feature extractors. Variations of HOG, LBP, VBOW, and the random forest itself are then applied to optimize the accuracy of the random forest. This is followed by an augmentation of the VBOW called Spatial Pyramid Matching. Finally, the random forest classifier is applied to the original images, using the VBOW feature representation. The experimental approach with original images largely follows the 2015 work by Parameswaran and Rainey [3]. In that work, the VBOW vocabulary size was varied, and multiple term frequency and collection frequency weights were applied before the features were fed into an SVM classifier.


A. VISUAL BAG OF WORDS AND TERM WEIGHTING 


SURF features are used to populate the VBOW. U-SURF is applied to the resized images because they are aligned, and the rotation-invariant SURF is applied to the original images. Simulations show that U-SURF extracted 2,310,400 features from all resized images and SURF extracted 4,033,540 features from all original images. The strongest 80% of the features were kept to populate the VBOW. As presented in [52], strength is calculated as the magnitude of the scale-normalized ($\sigma$) Laplacian of the intensity at the interest point:

$$\text{Strength} = \left| \sigma \nabla^{2} I(x, y, \sigma) \right|. \qquad (24)$$


The kept features are then clustered into the given number of visual words using the k-means algorithm, as discussed earlier. In our work, we allow the k-means algorithm up to 100 iterations, which was shown to be sufficient to reach convergence.
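The two VBOW construction steps above are keeping the strongest 80% of descriptors and clustering them with at most 100 k-means iterations. They can be sketched in plain Python. This is a simplified stand-in for the toolbox routines presumably used in this work, and it rests on three assumptions. It uses Lloyd's algorithm with randomly chosen initial centroids; the initialization actually used is not stated in this section. It uses tiny two-dimensional toy descriptors instead of 64-dimensional SURF vectors. Its strength values are made up.

```python
import random


def keep_strongest(descriptors, strengths, fraction=0.8):
    """Keep the strongest `fraction` of descriptors, ranked by interest-point strength."""
    k = int(round(fraction * len(descriptors)))
    order = sorted(range(len(descriptors)), key=lambda i: strengths[i], reverse=True)
    return [descriptors[i] for i in order[:k]]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means; the returned centroids serve as the visual words."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        updated = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[c]
                   for c, cl in enumerate(clusters)]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids


descriptors = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (5.0, 5.0)]
strengths = [5.0, 4.0, 3.0, 2.0, 1.0]           # toy values; the weakest point is dropped
kept = keep_strongest(descriptors, strengths)   # 4 of 5 descriptors survive
words = kmeans(kept, k=2)
print(sorted(tuple(round(x, 3) for x in w) for w in words))  # [(0.05, 0.0), (10.05, 10.0)]
```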




The final feature set is compiled by encoding each image as a histogram of visual words. The histogram is then weighted by the given term frequency (TF) and collection frequency (CF) factors. However, the supervised CF factors (χ² (CHI2), IG, MI, OR, LOGOR, RF) were originally defined for two-class datasets [3]. We therefore adopted an extension originally proposed in [33] for our four-class scenario. First, weights are calculated for every feature per class; next, the final CF factor is obtained by taking the maximum of the four weights for each feature [3]. L2 normalization is also applied to the VBOW of the original images with the RAW and LOG TF factors. This yields three sets of weighted features for the resized images and five sets of weighted features for the original images, as outlined in Table 3.
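The weighting pipeline described above has three parts: a TF factor, a CF factor made four-class by taking the per-word maximum over classes, and optional L2 normalization. It might look like the following sketch. The TF and CF definitions themselves were given earlier in the thesis and are not reproduced in this section, so two choices here are assumptions. LOG TF is taken as 1 + ln(tf) for tf > 0. Each class's CF weights are assumed to be precomputed and passed in as plain lists.

```python
import math


def tf_weight(counts, scheme):
    """Term-frequency factor for one image's visual-word counts.

    RAW keeps the counts, BINARY keeps only presence/absence, and LOG uses
    1 + ln(tf) for tf > 0 (an assumed form of the LOG TF).
    """
    if scheme == "RAW":
        return [float(c) for c in counts]
    if scheme == "BINARY":
        return [1.0 if c > 0 else 0.0 for c in counts]
    if scheme == "LOG":
        return [1.0 + math.log(c) if c > 0 else 0.0 for c in counts]
    raise ValueError(f"unknown TF scheme: {scheme}")


def multiclass_cf(per_class_weights):
    """Four-class extension: for each word, keep the maximum of its per-class CF weights."""
    return [max(weights) for weights in zip(*per_class_weights)]


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def weight_histogram(counts, scheme, cf, normalize=False):
    """TF factor times CF factor, word by word, optionally L2-normalized."""
    vec = [t * c for t, c in zip(tf_weight(counts, scheme), cf)]
    return l2_normalize(vec) if normalize else vec


# Toy 3-word vocabulary; rows are hypothetical CF weights for four classes.
per_class = [[1.0, 2.0, 3.0], [4.0, 0.0, 1.0], [0.0, 5.0, 2.0], [1.0, 1.0, 1.0]]
cf = multiclass_cf(per_class)                      # [4.0, 5.0, 3.0]
print(weight_histogram([0, 1, 3], "BINARY", cf))   # [0.0, 5.0, 3.0]
print(weight_histogram([0, 1, 3], "RAW", cf, normalize=True))
```

In this study, normalization is used only with the RAW and LOG TF factors on the original images, which corresponds to passing `normalize=True` for those two schemes.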


Table 3. VBOW datasets for resized and original images

Resized Images                   | Original Images
---------------------------------|------------------------------------------
RAW TF + Various CF Weights      | RAW TF + Various CF Weights
BINARY TF + Various CF Weights   | BINARY TF + Various CF Weights
LOG TF + Various CF Weights      | LOG TF + Various CF Weights
                                 | L2-normed RAW TF + Various CF Weights
                                 | L2-normed LOG TF + Various CF Weights

B. RANDOM FOREST CONSTRUCTION AND ANALYSIS 


The out-of-bag error of a random forest is calculated concurrently with the growing of the forest. For example, a 100-tree random forest has 100 out-of-bag error values, each giving the ensemble error of the forest grown to that point [53]. Because the out-of-bag error is calculated only from out-of-bag observations, it may vary considerably when only a small number of trees have been grown, owing to the random nature of bootstrapping. Therefore, enough trees are grown to yield a stable out-of-bag error. Figure 16 shows that the error has stabilized by 3000 trees. For features extracted using HOG and LBP, a 3000-tree random forest was grown. For VBOW features, 3000-tree and 1500-tree random forests were grown for the unsupervised-CF-weighted and supervised-CF-weighted VBOW, respectively.








Figure 16. Out-of-bag error vs. number of trees (plot title: Out-of-Bag Error Excluding In-Bag Observations; x-axis: Number of Grown Trees)
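A growing out-of-bag error curve such as the one in Figure 16 can be produced by accumulating out-of-bag votes one tree at a time. The sketch below takes per-tree prediction lists and in-bag index sets as inputs, which are assumptions made for illustration. It also adds a hypothetical `is_stable` helper that treats the curve as stable when its last few values vary by less than a tolerance. The window and tolerance are arbitrary and are not values used in this work.

```python
from collections import Counter


def oob_error_curve(tree_predictions, in_bag_sets, y_true):
    """Out-of-bag error of the forest after 1, 2, ..., M trees have been grown."""
    votes = [Counter() for _ in y_true]
    curve = []
    for preds, bag in zip(tree_predictions, in_bag_sets):
        for j, p in enumerate(preds):
            if j not in bag:          # this tree did not train on image j
                votes[j][p] += 1
        scored = [j for j, v in enumerate(votes) if v]
        wrong = sum(1 for j in scored if votes[j].most_common(1)[0][0] != y_true[j])
        curve.append(wrong / len(scored) if scored else float("nan"))
    return curve


def is_stable(curve, window=3, tol=0.01):
    """Hypothetical check: the last `window` errors span less than `tol`."""
    tail = curve[-window:]
    return len(tail) == window and max(tail) - min(tail) < tol


tree_predictions = [[0, 1, 1, 0], [0, 0, 1, 0], [0, 1, 0, 0], [1, 1, 1, 1]]
in_bag_sets = [{0}, {1}, {2}, {3}]
y_true = [0, 1, 1, 1]
curve = oob_error_curve(tree_predictions, in_bag_sets, y_true)
print(curve)             # [0.3333333333333333, 0.25, 0.25, 0.25]
print(is_stable(curve))  # True
```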


To compare the performance of each random forest consistently, the out-of-bag error of the fully grown forest is used to report classification accuracy, calculated as

$$\text{Accuracy}\ (\%) = 100 \times \bigl(1 - \text{out-of-bag error}\bigr). \qquad (25)$$

Recall that the out-of-bag error is based on classification predictions from only about one-third of the total trees grown [45]. Therefore, the out-of-bag error tends to overestimate the true error rate [45], so there is little risk of overestimating the accuracy of the random forest.


C. COMPARISON OF SVM AND RANDOM FOREST PERFORMANCE ON 
RESIZED IMAGES 


The random forest is compared to two SVM models: one run in this work and one from previous work by Parameswaran and Rainey [3]. Both use an 80%/20% training/testing split averaged over five runs on the resized images [1]. The SVM model in [2] used linear and Gaussian kernels, but the specific kernel parameters were not reported. HOG, HMLBP, and a SIFT-populated 500-word VBOW were used with the previous SVM model. The SVM model in this work uses the MATLAB built-in linear kernel. HOG, LBP, and a SURF-populated 500-word VBOW are used with the random forest and the current SVM model. The number of features extracted with HOG and HMLBP in Rainey et al. [2] is unknown. The HOG and LBP parameters in our work are therefore chosen to yield approximately the same number of features, as presented in Table 4. The number of features randomly sampled at each node of the random forest ($f$) is calculated as

$$f = \left\lfloor \sqrt{F} + 0.5 \right\rfloor, \qquad (26)$$

where $F$ is the total number of features; that is, $f$ is $\sqrt{F}$ rounded to the nearest integer.
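Equation (26) rounds the square root of $F$ to the nearest integer. A short check against the feature counts listed in Tables 4 and 5 follows. For an integer $F$, $\sqrt{F}$ can never lie exactly halfway between two integers, so `math.floor(sqrt(F) + 0.5)` always agrees with ordinary rounding here. The function name `n_subsampled` is hypothetical.

```python
import math


def n_subsampled(F):
    """Eq. (26): f = floor(sqrt(F) + 0.5)."""
    return math.floor(math.sqrt(F) + 0.5)


# (F, f) pairs as listed in Tables 4 and 5.
table_values = {4896: 70, 4860: 70, 500: 22, 22032: 148, 9504: 97,
                3024: 55, 21870: 148, 9720: 99, 2835: 53}
for F, f in table_values.items():
    print(F, n_subsampled(F), n_subsampled(F) == f)
```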


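Table 4. HOG, LBP, and VBOW parameters

Feature Algorithm | Cell Size | Sample Points | Radius | No. of Features (F) | No. of Sub-sampled Features (f)
------------------|-----------|---------------|--------|---------------------|--------------------------------
HOG               | 16 x 16   | N/A           | N/A    | 4896                | 70
LBP               | 32 x 32   | 12            | 3      | 4860                | 70
VBOW              | N/A       | N/A           | N/A    | 500                 | 22
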
D. HOG AND LBP 


The HOG and LBP cell sizes are varied to determine the number of features that optimizes random forest accuracy, as shown in Table 5. The LBP number of sample points and radius were held constant at 12 and 3, respectively.
Table 5. HOG and LBP parameters

HOG Cell Size | No. of Features (F) | No. of Sub-sampled Features (f) | LBP Cell Size | No. of Features (F) | No. of Sub-sampled Features (f)
--------------|---------------------|---------------------------------|---------------|---------------------|--------------------------------
8 x 8         | 22032               | 148                             | 16 x 16       | 21870               | 148
12 x 12       | 9504                | 97                              | 24 x 24       | 9720                | 99
16 x 16       | 4896                | 70                              | 32 x 32       | 4860                | 70
20 x 20       | 3024                | 55                              | 40 x 40       | 2835                | 53

E. APPLICATION OF VBOW AND RANDOM FOREST TO RESIZED 
IMAGES 


1. Varying Word Size and Term Weights 


RAW, BINARY, and LOG term frequency weights, each combined with all collection frequency weights, were applied to vocabularies of 100, 200, 500, 1000, 5000, 10000, and 15000 words.


2. Spatial Pyramid Matching


In Huang et al., augmenting VBOW with Spatial Pyramid Matching was shown to increase classification accuracy for the resized images by adding structural and spatial information to the final histogram [8].


In SPM, the image is divided into $2^l \times 2^l$ segments of equal size, where $l$ is the layer number, starting at 0. The segments in each layer are used to create a VBOW. Each image is then encoded with the VBOW of each layer to create its frequency histograms. In our study, we used three layers, corresponding to $l = 0, 1, 2$. Finally, the histograms from these three layers were concatenated, from the 0th to the 2nd layer, to create one overall histogram [8]. Note that the segment histograms, and then the layer histograms, are concatenated in the same order for every final histogram. A three-layer pyramid is illustrated in Figure 17.
Figure 17. Illustration of three-level SPM.
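The layer-by-layer encoding described above can be sketched as follows. This is an illustration built on several assumptions, not the implementation of [8] or of this work. Keypoints are given as (x, y, descriptor) tuples, and segments are indexed row by row. `quantize` is a hypothetical hook that returns a word index within the vocabulary of the segment containing the keypoint; in practice this would be a nearest-centroid lookup. The 300 x 150 image size is a toy value. The sketch shows the essential point: segment histograms, and then layer histograms, are always concatenated in the same fixed order.

```python
def segment_index(x, y, width, height, layer):
    """Row-major index of the 2^l x 2^l segment that contains pixel (x, y)."""
    cells = 2 ** layer
    col = min(int(x * cells / width), cells - 1)
    row = min(int(y * cells / height), cells - 1)
    return row * cells + col


def spm_histogram(keypoints, width, height, words_per_segment, quantize):
    """Concatenate per-segment word histograms for layers 0, 1, ..., L-1.

    words_per_segment[l] is the vocabulary size of every segment in layer l.
    quantize(layer, segment, descriptor) -> word index (hypothetical hook).
    """
    hist = []
    for layer, n_words in enumerate(words_per_segment):
        blocks = [[0] * n_words for _ in range(4 ** layer)]
        for x, y, desc in keypoints:
            seg = segment_index(x, y, width, height, layer)
            blocks[seg][quantize(layer, seg, desc)] += 1
        for block in blocks:  # fixed segment order, identical for every image
            hist.extend(block)
    return hist


def first_word(layer, seg, desc):
    return 0  # toy quantizer: every descriptor maps to word 0


# Two keypoints in opposite corners of a toy 300 x 150 image, 2 words per segment.
kps = [(10, 10, None), (290, 140, None)]
h = spm_histogram(kps, 300, 150, [2, 2, 2], first_word)
print(len(h), sum(h))  # 42 6
```

Each keypoint is counted once per layer, which is why the six counts above come from only two keypoints across three layers.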


We applied two variants of SPM in our study. For both, the number of words per segment was chosen so that the total vocabulary size would be close to 5000 words. This allows the results to be compared with the 5000-word VBOW without SPM. In the first variant, 238 words were extracted from every segment of each layer. The total number of segments is 1 + 4 + 16 = 21, so the resulting total is 238 × 21 = 4998 words. In the second variant, the number of words per segment was proportional to the size of the segment, calculated as

$$\frac{5000 \text{ words}}{3 \text{ layers} \times 1 \text{ segment}} \approx 1667 \text{ words/segment} \qquad (27)$$

for the 0th layer (1668 words were used, so that the total below is exactly 5000),

$$\frac{5000 \text{ words}}{3 \text{ layers} \times 4 \text{ segments}} \approx 417 \text{ words/segment} \qquad (28)$$

for the 1st layer, and

$$\frac{5000 \text{ words}}{3 \text{ layers} \times 16 \text{ segments}} \approx 104 \text{ words/segment} \qquad (29)$$

for the 2nd layer, totaling

$$1668 \times 1 + 417 \times 4 + 104 \times 16 = 5000 \text{ words}. \qquad (30)$$
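The arithmetic behind both vocabulary allocations can be checked directly. In the sketch below, plain rounding of 5000/3 split evenly over each layer's $4^l$ segments gives 1667, 417, and 104 words per segment, which totals 4999. Raising the 0th-layer count to 1668, as in Eq. (30), gives exactly 5000. The function names are illustrative only.

```python
def total_words(words_per_segment):
    """Total vocabulary size when layer l has 4**l segments."""
    return sum(n * 4 ** layer for layer, n in enumerate(words_per_segment))


def proportional_allocation(total=5000, layers=3):
    """Split the vocabulary equally among layers, then evenly over each layer's segments."""
    return [round(total / layers / 4 ** layer) for layer in range(layers)]


print(total_words([238, 238, 238]))            # 4998 (equal allocation, 21 segments)
print(proportional_allocation())               # [1667, 417, 104]
print(total_words(proportional_allocation()))  # 4999
print(total_words([1668, 417, 104]))           # 5000
```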


F. APPLICATION OF VBOW AND RANDOM FOREST ON ORIGINAL 
IMAGES 


Experiments were conducted on VBOW features with the following vocabulary sizes: 100, 200, 500, 1000, 5000, 10000, and 15000 words. The RAW, BINARY, and LOG term frequency weights and all collection frequency weights were applied. The smallest vocabulary size that does not reduce classification accuracy is desirable because it allows the decision trees to grow more quickly, which reduces total computation time. The random forest performance is compared to that of the linear-kernel SVM used in [3].


G. SUMMARY 


This chapter outlined the methodology used in feature extraction and classification. When the random forest is applied to resized images, our goal is to investigate its viability as an image classifier by comparing its performance to that obtained from the SVM model. Once viability is shown, the feature extractor parameters are changed to enhance the random forest accuracy. Next, the practicality of the VBOW feature representation and term weights is assessed. Two comparisons are conducted. The first compares the performance of the weighted VBOW on resized and original images when the random forest is the classifier. The second compares SVM and random forest classification accuracies on the original images.
IV. EXPERIMENTAL RESULTS


In this chapter, the random forest and SVM classification accuracies obtained in this research are presented and analyzed. The order of presentation corresponds to the experimental order outlined in Chapter III.


A. RESIZED IMAGES 
1. Random Forest and SVM 


Results show that the random forest performance exceeds or is similar to that of the SVM used in this study, as reflected in Table 6. The accuracies of both the SVM and the random forest used in this work exceed that of the SVM in the previous work by approximately 10%. For HOG and LBP, the better performance may be due to the greater number of features extracted. For VBOW, the improved performance of the SVM model used in this work may be due to differences between the SIFT and SURF features or to different parameters in the classifiers themselves.
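Table 6. Random forest and SVM accuracy

Feature Extractor | Random Forest Accuracy (%) | SVM Accuracy (%) (Current) | SVM Accuracy (%) (Previous) [6]
------------------|----------------------------|----------------------------|--------------------------------
HOG               | 94.00                      | 9?.80                      | 81.60
LBP / HMLBP       | 91.25                      | 89.80                      | 90.80
VBOW              | 84.37                      | 85.60                      | 76.80
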
2. Varying No. of Features Extracted Using HOG and LBP


The highest accuracies achieved with HOG and LBP were 94.00% and 93.25%, respectively, as shown in Table 7. For LBP, the accuracy of the random forest decreases steadily as the cell size increases and the number of features decreases. The HOG feature set that yielded the best result contains 4896 features, while the best LBP feature set contains 21870 features. HOG reached a higher accuracy with less than a quarter as many features, so HOG may be the more effective feature extractor when used with the random forest classifier.
Table 7. Random forest accuracy as HOG and LBP cell size varies

HOG Cell Size (No. of Features) | Accuracy (%) | LBP Cell Size (No. of Features) | Accuracy (%)
--------------------------------|--------------|---------------------------------|-------------
8 x 8 (22032 features)          | 92.00        | 16 x 16 (21870 features)        | 93.25
12 x 12 (9504 features)         | 93.25        | 24 x 24 (9720 features)         | 91.75
16 x 16 (4896 features)         | 94.00        | 32 x 32 (4860 features)         | 91.25
20 x 20 (3024 features)         | 93.00        | 40 x 40 (2835 features)         | 89.87

3. Varying Vocabulary Size 


Figures 18-20 present the random forest accuracy achieved when applying RAW, BINARY, and LOG TF weights in combination with all CF weights. Results show that the RAW and LOG TF accuracies are similar, both increasing only slightly as the vocabulary size increases. The BINARY TF results are the most responsive to increases in vocabulary size, ranging from 52.50% for the 100-word vocabulary to 86.75% for the 15000-word vocabulary. The CF factors affected the accuracy for all three TF weights in only one case: the 100-word VBOW, where the OR and LOGOR CF weights reduced accuracy by 3-5%. Elsewhere, their influence was minimal. The best vocabulary sizes for all three TF weights were 10000 and 15000 words, which yielded similar accuracies for all three. The BINARY TF records only whether a visual word occurs in an image, yet it reaches accuracies similar to the RAW and LOG TF at these vocabulary sizes. These findings therefore indicate that, when the vocabulary is sufficiently large, the random forest may be classifying images effectively based solely on the presence or absence of visual words rather than on their frequency of occurrence.
Figure 18. Random forest results with RAW TF and all CF weighted features

Figure 19. Random forest results with BINARY TF and all CF weighted features

Figure 20. Random forest results with LOG TF and all CF weighted features


4. Spatial Pyramid Matching 


Results show that adding spatial information through SPM consistently increased the accuracy of the random forest, as shown in Table 8. Accuracies increased by roughly 1-4 percentage points. Both SPM variants considered in this study performed similarly.


Table 8. Random forest results with and without SPM augmentation of the 5000-word VBOW

TF-CF Weighting | SPM (Proportional No. of Words/Segment) Accuracy (%) | SPM (Equal No. of Words/Segment) Accuracy (%) | No SPM Accuracy (%)
----------------|------------------------------------------------------|-----------------------------------------------|--------------------
RAW-NONE        | 88.00                                                | 88.87                                         | 86.12
RAW-IDF         | 88.00                                                | 87.62                                         | 86.75
RAW-PIDF        | 88.75                                                | 88.62                                         | 85.12
RAW-CHI2        | 88.50                                                | 88.00                                         | 85.62
RAW-IG          | 88.12                                                | 88.12                                         | 86.62
RAW-MI          | 88.12                                                | 88.25                                         | 84.50
RAW-OR          | 88.62                                                | 88.50                                         | [illegible]
RAW-LOGOR       | 88.50                                                | 87.50                                         | 85.00
RAW-RF          | 88.25                                                | 87.50                                         | 85.25

B. ORIGINAL IMAGES


Figures 21, 23, and 25 show random forest results for all three TF weights, with no normalization and all CF weights applied. Figures 22, 24, and 26 show the corresponding SVM results obtained in [3]. Results show that the RAW and LOG TF weights yielded much lower accuracies for the random forest than for the SVM. With these two TF weights, the random forest accuracy changed minimally as the vocabulary size varied, although 1000-word vocabularies yielded the highest accuracies. In contrast to the SVM results, random forest performance with all three TF weights was not affected by the application of any CF weights, unsupervised or supervised.


Simulations show that, for the random forest, the BINARY TF weighted VBOW results are the most responsive to increases in vocabulary size and are comparable to the SVM results for the large vocabulary sizes (10000 and 15000 words). The BINARY TF weighted VBOW results behave similarly for both resized and original images. This provides further evidence that the random forest effectively classifies images based solely on the presence or absence of visual words when the vocabulary size is large enough.


Figure 21. Random forest results with RAW TF and all CF weighted features (panel title: RAW TF Weight + Various CF Weights)

Figure 22. SVM results with RAW TF and all CF weighted features (panel title: Raw TF Term Weights + Various Collection Frequency Weights; axes: number of words vs. accuracy (%)). Source: [3].

Figure 23. Random forest results with BINARY TF and all CF weighted features

Figure 24. SVM results with BINARY TF and all CF weighted features (panel title: Binary Term Weights + Various Collection Frequency Weights). Source: [3].

Figure 25. Random forest results with LOG TF and all CF weighted features (panel title: LOG TF Weight + Various CF Weights)

Figure 26. SVM results with LOG TF and all CF weighted features (panel title: Log Term Frequency Term Weights + Various Collection Frequency Weights). Source: [3].


Figures 27 and 29 show the classification performance of the random forest classifier using the RAW and LOG TF weighted VBOW, with L2 normalization and all CF weights applied. Figures 28 and 30 show the corresponding SVM results obtained in [3]. Results show that normalization has little effect on random forest performance. The SVM clearly outperforms the random forest at nearly all vocabulary sizes. The only exception is the CHI2 CF weight applied to the 10000- and 15000-word vocabularies, for which the random forest outperforms the SVM. The unsupervised and supervised CF weights again have minimal effect on the random forest accuracy.
Figure 27. Random forest results with L2-normed RAW TF and all CF weighted features (panel title: Normalized RAW TF Weight + Various CF Weights; legend: RAW-NONE, RAW-IDF, RAW-PIDF, RAW-CHI2, RAW-IG, RAW-MI, RAW-OR, RAW-LOGOR, RAW-RF; x-axis: number of words)

Figure 28. SVM results with L2-normed RAW TF and all CF weighted features. Source: [3].

Figure 29. Random forest results with L2-normed LOG TF and all CF weighted features (panel title: Normalized LOG TF Weight + Various CF Weights)

Figure 30. SVM results with L2-normed LOG TF and all CF weighted features (panel title: Log Term Frequency Term Weights + Various Collection Frequency Weights). Source: [3].


C. SUMMARY 


In this chapter we presented the classification accuracies of the SVM and random forest classifiers on the resized and original images. The random forest matched or outperformed the SVM on the resized images. On the original images it underperformed the SVM considerably, except with BINARY TF weighting at large vocabulary sizes. We discuss conclusions drawn from these results and recommendations for future work in the next chapter.
V. CONCLUSIONS AND RECOMMENDATIONS


Due to the greater availability of satellite imagery, ship detection and classification is an increasingly popular area of study [1]. Contributing to that body of work, we investigated how effectively the random forest algorithm, in conjunction with various feature extractors, classifies vessel types. The random forest's ease of use and computational efficiency make it an attractive classifier, and this research provides further evidence for the baseline viability of the random forest in the ship classification task.


Results show that the highest accuracies were achieved with features extracted using the HOG and LBP algorithms. Additionally, we observed a consistent accuracy improvement with the SPM-augmented VBOW. These results suggest that encoding spatial and structural information in the feature vectors enhances the performance of the random forest.


Simulation results show a large difference in accuracy between the VBOW representations of resized and original images, especially when the RAW and LOG TF were applied. Results strongly suggest that the random forest yields higher accuracies on vessel images of the same size and orientation. In contrast to the VBOW-SVM results, the VBOW-random forest results consistently showed little influence of collection frequency factors and normalization, suggesting limited applicability of these weighting schemes.


The VBOW-random forest performance was most responsive to vocabulary size when the BINARY TF factor was applied, with the largest vocabularies yielding the highest accuracies. This finding suggests that the random forest effectively characterizes images based on the presence or absence of visual words in an image. Results suggest that the BINARY TF weighted VBOW is the only viable feature representation for original images among those considered in this study, as it was the only one that yielded accuracies greater than 80%. There is ample opportunity for future research to optimize random forest performance and configure it for military applications.
Recall that the random forest can natively process any combination of numerical, categorical, and ordinal data. The algorithm is thus well suited to classifying images based on features extracted by different algorithms or from different sensors. Two possible examples of heterogeneous feature vectors are those that combine image features with image metadata, and those composed of features from both local and global extractors.


In this study, the VBOW was formed by k-means clustering of the strongest 80% of SURF descriptors. Alternative descriptors such as SIFT can be used to form the VBOW. Varying the descriptor strength threshold used in forming the VBOW may also improve the performance of the random forest. Visual word generation may be further improved through alternative clustering methods such as hierarchical clustering, Gaussian mixture models, and self-organizing maps [54].